recognition behavior
Robotic Applications of Pre-Trained Vision-Language Models to Various Recognition Behaviors
Kawaharazuka, Kento, Obinata, Yoshiki, Kanazawa, Naoaki, Okada, Kei, Inaba, Masayuki
In recent years, a number of models that learn the relations between vision and language from large datasets have been released. These models perform a variety of tasks, such as answering questions about images, retrieving the sentences that best correspond to an image, and finding the image regions that correspond to a phrase. Although a few applications exist, the connection between these pre-trained vision-language models and robotics is still weak. If the models are connected directly to robot motions, they lose their versatility because of the robot's embodiment and the difficulty of data collection, and they become inapplicable to a wide range of bodies and situations. Therefore, in this study, we categorize and summarize methods for using pre-trained vision-language models flexibly and easily, in a form that the robot can understand, without connecting them directly to robot motions. We discuss how to use these models for robot motion selection and motion planning without re-training them. We consider five types of methods for extracting information that robots can understand, and show results for state recognition, object recognition, affordance recognition, relation recognition, and anomaly detection based on combinations of these five methods. We expect that this study will add flexibility and ease of use, as well as new applications, to the recognition behavior of existing robots.
A newborn embodied Turing test for view-invariant object recognition
Pak, Denizhan, Lee, Donsuk, Wood, Samantha M. W., Wood, Justin N.
Recent progress in artificial intelligence has renewed interest in building machines that learn like animals. Almost all of the work comparing learning across biological and artificial systems comes from studies in which animals and machines received different training data, obscuring whether differences between animals and machines emerged from differences in learning mechanisms or in training data. We present an experimental approach, a "newborn embodied Turing Test", that allows newborn animals and machines to be raised in the same environments and tested with the same tasks, permitting direct comparison of their learning abilities. To build this platform, we first collected controlled-rearing data from newborn chicks, then performed "digital twin" experiments in which machines were raised in virtual environments that mimicked the rearing conditions of the chicks. We found that (1) machines (deep reinforcement learning agents with intrinsic motivation) can spontaneously develop visually guided preference behavior, akin to imprinting in newborn chicks, and (2) machines are still far from newborn-level performance on object recognition tasks. Almost all of the chicks developed view-invariant object recognition, whereas the machines tended to develop view-dependent recognition. The learning outcomes were also far more constrained in the chicks than in the machines. Ultimately, we anticipate that this approach will help researchers develop embodied AI systems that learn like newborn animals.